Patent abstract:
METHOD AND APPARATUS TO COMPARE A FIRST IMAGE WITH A SECOND IMAGE, AND SYSTEM. A method for comparing a first image with a second image is provided. The method comprises identifying first key points in the first image and second key points in the second image, and associating each first key point with a corresponding second key point in order to form a corresponding key point match. For each pair of first key points, the method also involves calculating the distance between them to obtain a corresponding first length. Similarly, for each pair of second key points, the method involves calculating the distance between them to obtain a corresponding second length. The method further comprises calculating a plurality of distance ratios; each distance ratio is based on a length ratio between a selected one of a first length and a second length and a corresponding selected one of a second length and a first length, respectively. The method also includes calculating a statistical distribution of the plurality of distance ratios and generating a model function expressing a statistical distribution of additional distance ratios corresponding to a random selection of points (...).
Publication number: BR112013019031B1
Application number: R112013019031-0
Filing date: 2011-01-25
Publication date: 2021-03-09
Inventors: Giovanni Cordara; Gianluca Francini; Skjalg Lepsoy; Pedro Porto Buarque De Gusmao
Applicant: Telecom Italia S.P.A.
IPC main class:
Patent description:

Field of the Invention
[001] The present invention relates to the field of image analysis. Description of the Related Art
[002] In the field of image analysis, a common operation consists of comparing two images in order to find the relation occurring between them in case both images include at least a portion of the same scene or of the same object.
[003] Among a large number of applications, image comparison is of the utmost importance for calibrating video cameras belonging to a multi-camera system, for assessing the motion occurring between two frames of a video shot, and for the recognition of an object within an image (e.g., a picture). The latter application is now taking on more and more importance due to the recent development of object recognition algorithms specifically designed to be employed in so-called visual search engines, i.e., automated services that, starting from a picture, are able to identify the object(s) depicted therein and provide information related to the identified object(s). Examples of well-known services of this type include Google Goggles, Nokia Point & Find, and kooaba Smart Visuals. An object recognition application provides for comparing a first image - in jargon referred to as the "query image" - depicting an object to be recognized with a plurality of reference images, each depicting a respective known object; this allows a comparison to be made between the object depicted in the query image and the objects depicted in the reference images.
[004] Reference images are typically stored in a suitable reference database. The greater the number of reference images included in the database, the greater the number of comparison operations to be performed. In some cases, the reference database can become very large, negatively affecting the efficiency of the object recognition process. For example, in case object recognition is exploited in an online shopping scenario, in which each reference image corresponds to an item offered by an online store (e.g., the picture of a book cover, a DVD cover and/or a CD cover), the number of reference images may exceed a few million units. Furthermore, in order to efficiently manage such a huge amount of data, the comparison operations should be carried out by a processing unit provided with sufficient processing power.
[005] In the last decade, different algorithms have been proposed to reduce the time required to perform object recognition. These algorithms provide for drastically reducing the number of reference images that are candidates to include the object depicted in the query image.
[006] A very efficient way to perform comparison operations between two images provides for selecting a set of points - in jargon referred to as key points - in the first image and then matching each key point of the set to a corresponding key point in the second image. The selection of which points of the first image are to become key points is made by taking into account local features of the image area surrounding the point itself. In this regard, see "Distinctive image features from scale-invariant keypoints" by David G. Lowe, International Journal of Computer Vision, 2004.
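The key point matching step is typically implemented by comparing local descriptors (such as the SIFT descriptors of the Lowe paper cited above) with a nearest-neighbour search. The following is a purely illustrative sketch, not part of the patented method: descriptors are assumed to be plain numeric tuples, and the function name and the ratio-test threshold are choices of this sketch.

```python
import math

def match_keypoints(desc_a, desc_b, ratio=0.8):
    """Associate each descriptor of image A with its nearest neighbour in
    image B, keeping a match only when Lowe's ratio test passes, i.e. the
    best distance is clearly smaller than the second-best distance."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor i to every descriptor of image B.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        best, second = dists[0], dists[1]
        if best[0] < ratio * second[0]:
            matches.append((i, best[1]))
    return matches
```

The ratio test discards ambiguous associations, reducing (but not eliminating) the number of outliers that the later phases of the method must deal with.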
[007] If a correspondence between a key point of the first image and a corresponding key point of the second image is correct, in the sense that both key points correspond to the same point of the same object (depicted in both images), such a key point correspondence is referred to as an "inlier".
[008] Conversely, if a correspondence between a key point of the first image and a corresponding key point of the second image is incorrect, in the sense that the two key points do not correspond to the same point on the same object, such a key point correspondence is referred to as an “outlier”.
[009] Therefore, in order to obtain a reliable result, a procedure capable of distinguishing the inliers from the outliers is advantageously performed after the key point matches have been determined.
[0010] Several examples of procedures of this type are already known in the art.
[0011] Most of the procedures in use make use of the RANSAC algorithm published in "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography" by Martin A. Fischler and Robert C. Bolles, Communications of the ACM, 24(6): 381-395, June 1981. However, this algorithm is time consuming, because it is based on an iterative approach.
[0012] The algorithms disclosed in "Fast geometric re-ranking for image-based retrieval" by Sam S. Tsai, David Chen, Gabriel Takacs, Vijay Chandrasekhar, Ramakrishna Vedantham, Radek Grzeszczuk, Bernd Girod, International Conference on Image Processing, October 2010, and in international patent application WO2009/130451 are based on the fact that the ratio between the distances of key points is invariant under translation, rotation, and scaling. Additional algorithms of this type are also disclosed in "Adding Affine Invariant Geometric Constraint for Partial-Duplicate Image Retrieval" by Zhipeng Wu, Qianqian Xu, Shuqiang Jiang, Qingming Huang, Peng Cui, Liang Li, International Conference on Pattern Recognition, August 2010, pages 842-845, and in "Using Local Affine Invariants to Improve Image Matching" by Daniel Fleck, Zoran Duric, 20th International Conference on Pattern Recognition, 2010, pages 1844-1847.
[0013] In addition, US 2010/0135527 A1 discloses an image recognition algorithm including a keypoint-based comparison and a region-based color comparison. A method of identifying a target image using the algorithm includes: receiving an input at a processing device, the input including data related to the target image; performing a retrieval step including retrieving an image from an image database and, until the image is either accepted or rejected, designating the image as a candidate image; performing an image recognition step including using the processing device to perform an image recognition algorithm on the target and candidate images in order to obtain an image recognition algorithm output; and performing a comparison step including: if the output of the image recognition algorithm is within a pre-selected range, accepting the candidate image as the target image; and if the output of the image recognition algorithm is not within the pre-selected range, rejecting the candidate image and repeating the retrieval, image recognition, and comparison steps.
[0014] US 2010/0183229 A1 refers to a method, system and computer program product for image matching. The images to be matched are represented by feature points and by feature vectors and orientations associated with the feature points. First, putative matches are determined using the feature vectors. A subset of the putative matches is selected and the topological equivalence of that subset is determined. The topologically equivalent subset of putative matches is used to establish a motion estimation model. An orientation consistency test is performed on the putative matches and the corresponding motion estimation transformation that has been determined, to avoid an infeasible transformation. A coverage test is performed on the matches that meet the orientation consistency test. Candidate matches that do not cover a significant portion of one of the images are rejected. Final matched images are provided in decreasing match order, in case multiple images meet all the test requirements. Summary of the Invention
[0015] The Applicant has found that the known approaches mentioned above for implementing object recognition services are affected by several drawbacks. In particular, these approaches are time consuming, being based on iterative procedures and/or requiring an enormous amount of data to be processed.
[0016] The Applicant addressed the problem of how to improve these approaches in terms of time consumption and amount of data to be processed.
[0017] In particular, the Applicant addressed the problem of providing a method for comparing images that is reliable in terms of data processing and performs well in terms of time consumption.
[0018] The Applicant found that, starting from a set of key points generated in a first image (query image) and associated to a corresponding set of key points generated in a second image (reference image) in order to form a corresponding set of key point matches, a method for comparing images according to the present invention can include a main phase and two subsequent optional phases.
[0019] The main phase is applied after the generation of the key point matches, and provides for statistically processing the key point matches and accordingly verifying, through a geometric consistency check, whether the query image and the reference image may depict the same object or not. More in detail, after the generation of a model function expressing a statistical distribution of the incorrect matches (outliers), a goodness-of-fit test is performed in order to decide whether the reference image contains a view of an object present in the query image.
[0020] If so, the method is able to compute a score to be used to rank the actual similarity between the object depicted in the reference image and the one depicted in the query image.
[0021] The second phase allows determining how many key point matches are inliers among the complete set of key point matches.
[0022] This phase can be advantageously performed to increase the precision in visual search applications.
[0023] The third phase allows specifically identifying which key point correspondences are inliers, and which key point correspondences are outliers.
[0024] Such a phase can be advantageously performed in some particular applications, such as augmented reality.
[0025] More specifically, an aspect of the present invention relates to a method for comparing a first image with a second image. The method comprises identifying first key points in the first image and second key points in the second image, and associating each first key point with a corresponding second key point in order to form a corresponding key point match. For each pair of first key points, the method also comprises calculating the distance between them to obtain a corresponding first length. Similarly, for each pair of second key points, the method involves calculating the distance between them to obtain a corresponding second length. The method further comprises calculating a plurality of distance ratios; each distance ratio is based on a length ratio between a selected one of a first length and a second length and a corresponding selected one of a second length and a first length, respectively. The method also includes calculating a statistical distribution of the plurality of distance ratios and generating a model function expressing a statistical distribution of additional distance ratios corresponding to a random selection of key points in the first and second images. The method includes comparing the mentioned statistical distribution of the plurality of distance ratios with the mentioned model function, and verifying whether the first image contains a view of an object depicted in the second image based on the mentioned comparison.
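As a purely illustrative sketch of the distance-ratio computation described above (the function name and the log scaling are assumptions of this sketch; the claim covers either orientation of the length ratio), the pairwise first and second lengths and their ratios could be computed as follows:

```python
import math
from itertools import combinations

def distance_ratios(pts_a, pts_b):
    """For every pair of matched key points, compute the ratio between the
    first length (distance in the first image) and the corresponding second
    length (distance in the second image), log-scaled so that swapping the
    two images only flips the sign. pts_a[i] and pts_b[i] are assumed to
    form a key point match."""
    ratios = []
    for i, j in combinations(range(len(pts_a)), 2):
        first_length = math.dist(pts_a[i], pts_a[j])
        second_length = math.dist(pts_b[i], pts_b[j])
        ratios.append(math.log(first_length / second_length))
    return ratios
```

For a pure scaling between the two images, all ratios collapse to a single value (the log of the scale factor), which is exactly why the statistic separates coherent matches from random ones.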
[0026] In accordance with an embodiment of the present invention, the method includes arranging the distribution of the plurality of distance ratios in the form of a histogram having a plurality of ordered bins, each corresponding to a respective range of distance ratio values; the histogram lists, for each bin, a corresponding number of distance ratios of the distribution having values within the respective range. For each bin, the method also includes generating a corresponding model probability equal to the integral of the model function over the mentioned bin. The mentioned comparison of the distribution of the plurality of distance ratios with the mentioned model function includes comparing the histogram with the model probabilities.
[0027] Preferably, the mentioned comparison of the histogram with the model probabilities comprises performing a Pearson chi-square test.
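The Pearson chi-square test mentioned here compares, bin by bin, the observed histogram counts with the expected counts n * p_k derived from the model probabilities. A minimal sketch (the function name is an assumption, and the decision threshold on the statistic is left to the application):

```python
def pearson_chi_square(observed, model_probs):
    """Pearson's chi-square statistic between an observed histogram and
    the expected counts obtained from per-bin model probabilities. A large
    value indicates that the histogram is unlikely to come from the
    outlier model alone."""
    n = sum(observed)
    stat = 0.0
    for obs, p in zip(observed, model_probs):
        expected = n * p
        if expected > 0:
            stat += (obs - expected) ** 2 / expected
    return stat
```

A histogram that exactly matches the model yields a statistic of zero; deviations concentrated in a few bins (as produced by inliers) inflate the statistic.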
[0028] Advantageously, the aforementioned calculation of the distance ratios provides for calculating the logarithm of the length ratios.
[0029] According to an embodiment of the present invention, the method further comprises estimating a number of incorrect key point matches (an incorrect key point match is formed by a first and a second key point that do not correspond to the same point of a same object depicted in the first and second images). The mentioned estimation of the number of incorrect key point matches includes initializing a weighting parameter to an initial value and repeating: a) weighting the model probabilities with the weighting parameter, and b) increasing the weighting parameter value, until the value of at least one weighted model probability reaches the number of distance ratios listed by the histogram in the bin corresponding to that model probability. The method also comprises determining the number of incorrect key point matches based on the last value assumed by the weighting parameter.
[0030] According to one embodiment of the present invention, the method further comprises estimating a number of correct key point matches (a correct key point match is formed by a first and a second key point corresponding to the same point of a same object depicted in the first and second images). The mentioned estimation of the number of correct key point matches is based on the number of key point matches multiplied by a term equal to the square root of one minus the last value assumed by the weighting parameter.
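A hedged sketch of the estimation in paragraphs [0029] and [0030]: the weighting parameter is increased until a weighted expected count first reaches the histogram count of its bin, and the inlier count is then derived from one minus that parameter. The step size, the stopping rule and the function names below are assumptions of this sketch, not the patented implementation:

```python
import math

def estimate_outlier_weight(hist, model_probs, step=0.01):
    """Increase a weighting parameter beta from zero until, in at least
    one bin, the weighted expected count beta * n * p reaches the
    histogram count h of that bin (cf. paragraph [0029])."""
    n = sum(hist)
    beta = 0.0
    while beta < 1.0:
        if any(beta * n * p >= h for h, p in zip(hist, model_probs) if h > 0):
            break
        beta += step
    return beta

def estimate_inliers(n_matches, beta):
    """Estimated number of correct matches: N * sqrt(1 - beta)
    (cf. paragraph [0030])."""
    return n_matches * math.sqrt(max(0.0, 1.0 - beta))
```

Intuitively, beta measures the fraction of the histogram mass that the outlier model alone can explain without exceeding any observed bin.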
[0031] According to a further embodiment of the present invention, the method further comprises calculating a matrix; each element of the matrix corresponds to a respective pair of key point matches and has a value corresponding to the difference between the value assumed by the histogram in the bin including the distance ratio of the respective pair of key point matches and the weighted model probability corresponding to the mentioned bin. The method further comprises finding the dominant eigenvector of the matrix, and identifying which key point matches are most likely to be correct key point matches based on the mentioned dominant eigenvector.
[0032] The mentioned identification of which key point matches are most likely to be correct key point matches includes identifying the elements of the eigenvector having the highest absolute values.
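Finding the dominant eigenvector mentioned in paragraphs [0031] and [0032] can be sketched with plain power iteration; selecting the entries of largest absolute value then flags the matches most likely to be inliers. This is an illustrative implementation choice under stated assumptions, not necessarily the one used in the patent:

```python
import math

def dominant_eigenvector(mat, iters=200):
    """Power iteration: repeatedly multiply a start vector by the matrix
    and renormalise; for a generic start this converges to the eigenvector
    associated with the eigenvalue of largest magnitude."""
    n = len(mat)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(mat[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v

def likely_inliers(eigvec, top):
    """Indices of the `top` eigenvector entries of largest absolute value,
    i.e. the key point matches most likely to be correct."""
    return sorted(range(len(eigvec)), key=lambda i: -abs(eigvec[i]))[:top]
```

Power iteration only needs matrix-vector products, which keeps the cost low even when the match matrix is large.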
[0033] Another aspect of the present invention provides an apparatus for comparing a first image with a second image. The apparatus comprises a first identification unit configured to identify first key points in the first image and second key points in the second image, and an association unit configured to associate each first key point with a corresponding second key point to form a corresponding key point match. A first calculation unit is configured to calculate, for each pair of first key points, the distance between them to obtain a corresponding first length, while a second calculation unit is configured to calculate, for each pair of second key points, the distance between them to obtain a corresponding second length. The apparatus further comprises a third calculation unit configured to calculate a plurality of distance ratios; each distance ratio is based on a length ratio between a selected one of a first length and a second length and a corresponding selected one of a second length and a first length, respectively. The apparatus further comprises a fourth calculation unit configured to calculate the statistical distribution of the plurality of distance ratios and a first generation unit configured to generate a model function expressing a statistical distribution of additional distance ratios corresponding to a random selection of key points in the first and second images. The apparatus comprises a first comparison unit configured to compare the mentioned statistical distribution of the plurality of distance ratios with the mentioned model function, and a verification unit configured to verify whether the first image contains a view of an object depicted in the second image based on the mentioned comparison.
[0034] In accordance with an embodiment of the present invention, the apparatus further comprises a storage unit configured to arrange the distribution of the plurality of distance ratios in the form of a histogram having a plurality of ordered bins, each corresponding to a respective range of distance ratio values; the histogram lists, for each bin, a corresponding number of distance ratios of the distribution having values within the respective range. The apparatus also comprises a second generation unit configured to generate, for each bin, a corresponding model probability equal to the integral of the model function over the mentioned bin. The mentioned first comparison unit includes a second comparison unit configured to compare the histogram with the model probabilities.
[0035] In accordance with a further embodiment of the present invention, the apparatus comprises a first estimation unit configured to estimate a number of incorrect key point matches; an incorrect key point correspondence is formed by a first and a second key point that do not correspond to the same point of the same object depicted in the first and second images. The first estimation unit includes an initialization unit configured to initialize a weighting parameter to an initial value and a weighting unit configured to repeat the operations of: a) weighting the model probabilities with the weighting parameter, and b) increasing the value of the weighting parameter, until the value of at least one weighted model probability reaches the number of distance ratios listed by the histogram in the bin corresponding to that model probability. The apparatus also includes a determination unit configured to determine the number of incorrect key point matches based on the last value assumed by the weighting parameter.
[0036] Preferably, the apparatus further comprises a second estimation unit configured to estimate a number of correct key point matches; the mentioned second estimation unit is configured to estimate the number of correct key point matches based on the number of key point matches multiplied by a term equal to the square root of one minus the last value assumed by the weighting parameter.
[0037] In accordance with a still further embodiment of the present invention, the apparatus further includes a fifth calculation unit configured to calculate a matrix; each element of the matrix corresponds to a respective pair of key point matches and has a value corresponding to the difference between the value assumed by the histogram in the bin including the distance ratio of the respective pair of key point matches and the weighted model probability corresponding to the mentioned bin. The apparatus further includes a finding unit configured to find the dominant eigenvector of the matrix, and a second identification unit configured to identify which key point matches are most likely to be correct key point matches based on the dominant eigenvector.
[0038] A still further aspect of the present invention provides a system, which includes a key point detection unit configured to receive a query image and identify corresponding first key points in the mentioned image, and a feature computing unit configured to describe the local aspect of the mentioned first key points through corresponding first local descriptors. The system also includes a reference database storing a plurality of reference images; for each reference image, the reference database also stores corresponding second key points and corresponding second local descriptors of the second key points. The system also includes a feature matching unit configured to compare, for each reference image of at least one group of reference images, the first local descriptors with the second local descriptors of the mentioned reference image, and accordingly associate the first key points with the second key points of the mentioned reference image to generate a corresponding set of key point matches. The system further includes a selection unit configured to select a subset of reference images based on the comparisons made by the feature matching unit, and an optimization unit configured to calculate, for each pair comprising the query image and a reference image of the subset, the number of correct key point matches.
[0039] In accordance with an embodiment of the present invention, the system further comprises a visual search server and a plurality of terminals configured to exchange data with the visual search server over a network.
[0040] In accordance with one embodiment of the present invention, the visual search server includes the key point detection unit, the feature computing unit, the reference database, the feature matching unit, the selection unit and the optimization unit.
[0041] According to another embodiment of the present invention, the visual search server includes the reference database, the feature matching unit, the selection unit and the optimization unit, and each terminal includes a respective key point detection unit and a respective feature computing unit.
[0042] According to a still further embodiment of the present invention, the visual search server includes the reference database, and each terminal includes a respective key point detection unit, a respective feature computing unit, a respective feature matching unit, a respective selection unit, a respective optimization unit and a respective local database. Each terminal is configured to receive from the visual search server a respective set of second key points and corresponding second local descriptors of the second key points stored in the reference database, and the terminal's local database is configured to store the received set of second key points and second local descriptors; the mentioned stored set of second key points and second local descriptors corresponds to the reference images of at least one group of reference images. Brief description of the drawings
[0043] These and other features and advantages of the present invention will be made more evident by the following description of some exemplary and non-limiting embodiments thereof, to be read in conjunction with the attached drawings, in which: Figure 1A illustrates an example in which key points of two images are associated with each other to form key point matches; Figure 1B illustrates the example of Figure 1A, in which only the inliers are represented; Figure 1C illustrates an LDR histogram corresponding to the example of Figure 1A; Figure 2 illustrates the shape of an outlier model function according to an embodiment of the invention; Figures 3A-3F illustrate several examples of LDR histograms generated from an image pair; Figure 4 illustrates an exemplary case in which the query image and the reference image depict the same planar object seen from different angles; Figures 5A and 5B illustrate two exemplary cases in which nearby planar objects are shown with moderate differences in viewing angles; Figure 6 shows an example of scaling the model probabilities to estimate the number of inliers according to an embodiment of the present invention; Figure 7A is a flow chart illustrating the main steps of the first phase of the method according to an embodiment of the present invention; Figure 7B is a flow chart illustrating the main steps of the second phase of the method according to an embodiment of the present invention; Figure 7C is a flow chart illustrating the main steps of the third phase of the method according to an embodiment of the present invention; Figure 8 schematically illustrates a possible scenario in which the method according to an embodiment of the present invention can be exploited to implement a visual search service; Figure 9A illustrates a system implementing a visual search service in accordance with an embodiment of the present invention; Figure 9B illustrates a system implementing a visual search service in accordance with a further embodiment of the present invention; Figure 9C illustrates a system implementing a visual search service according to a still further embodiment of the present invention; and Figure 10 is a flow chart illustrating the main steps of an automatic video camera calibration method according to an embodiment of the present invention. Detailed description of the exemplary embodiments of the invention
[0044] As already mentioned above, a comparison between two images provides for matching key points of the first image with corresponding key points of the second image. A key point correspondence is said to be correct (an inlier) if the corresponding key points of the two images correspond to the same point of the same object (depicted in both images); conversely, a key point match is said to be incorrect (an outlier) if the two key points do not correspond to the same point of the same object. In the example illustrated in Figure 1A, in which each image is a picture of the same object (a church), each key point correspondence is represented by a respective solid line. The key point correspondences illustrated in the exemplary case of Figure 1A include both inliers and outliers. The version of the same exemplary case in which the outliers have been removed is instead depicted in Figure 1B.
[0045] In the remainder of this description, a new image comparison method will be presented. Starting from a set of key points generated in a first image - referred to as the query image, or simply query - associated with a corresponding set of key points generated in a second image - referred to as the reference image - in order to form a corresponding set of key point matches, the proposed method includes a main phase and two subsequent optional phases: 1) The main phase is applied after the generation of the key point matches, and provides for statistically processing the key point matches and accordingly verifying, through a geometric consistency check, whether the query image and the reference image may depict the same object or not. More in detail, after the generation of a probabilistic model of the incorrect correspondences (outliers), a goodness-of-fit test is performed in order to decide whether the reference image contains a view of an object present in the query image. If so, the method is able to compute a score to be used to rank the actual similarity between the object depicted in the reference image and the one depicted in the query image. 2) The second (optional) phase allows determining how many key point matches are inliers among the complete set of key point matches. This phase can be advantageously performed to increase the accuracy in visual search applications. 3) The third phase (also optional) allows specifically identifying which key point matches are inliers, and which key point matches are outliers. Such a phase can be advantageously performed in some particular applications, such as augmented reality.
[0046] In Section 1 of this description, the properties of the particular statistic used in this method, the log distance ratio, will be introduced, both for correct and incorrect correspondences. The following three sections (Sections 2-4) disclose mathematical and statistical aspects of the three phases of the proposed method. Section 5 discloses the main steps of the three phases of the method. The last section (Section 6) is devoted to exemplary applications of the proposed method. Section 1 - The distance ratio statistic
[0047] Let a set of N matching key points be considered:

{(xi, yi)}, i = 1, ..., N,

where xi contains the coordinates of the i-th key point in the query image and yi contains the coordinates of its matching key point in the reference image. A pair (xi, yi) is called an inlier if the two key points are correctly matched. Conversely, a pair is called an outlier if the key points are incorrectly matched.
[0048] The proposed method makes use of the so-called log distance ratio (LDR for short), proposed in the document cited above by Tsai et al.:

ldr(xi, xj, yi, yj) = ln( ||xi - xj|| / ||yi - yj|| )
[0049] The key points need to be distinct, i.e., xi ≠ xj and yi ≠ yj, and the LDR is undefined for i = j. The LDR is a function of the length ratio, an invariant under similarities. Thanks to the presence of the logarithm operator, if the query image is exchanged with the reference image (x becomes y and vice versa), the LDR changes sign.
[0050] Given a set of N matching key points (xi, yi) - including N key points xi in the query image and N corresponding key points yi in the reference image - there are N(N - 1)/2 log distance ratios. The statistical distribution of such log distance ratios is expressed in the form of a corresponding histogram, here referred to as the "LDR histogram". The LDR histogram will be denoted by the vector h = [h1 ... hK]^T; h is a frequency vector obtained by counting the observed log distance ratios falling within each of K predefined intervals T1, ..., TK, hereinafter referred to as bins. For example, such bins can be 25 intervals of width 0.2 between the lower value -2.5 and the upper value 2.5, i.e.:

Tk = [-2.5 + (k - 1) * 0.2, -2.5 + k * 0.2), k = 1, ..., 25.
[0051] An example of an LDR histogram corresponding to the case of Figure 1A is illustrated in Figure 1C.
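As an illustrative sketch of the binning just described (25 bins of width 0.2 spanning [-2.5, 2.5]), an LDR histogram could be accumulated from a list of log distance ratios as follows (the function name and the handling of out-of-range values are assumptions):

```python
def ldr_histogram(ratios, k=25, lo=-2.5, hi=2.5):
    """Accumulate the LDR histogram h = [h1 ... hK]: count how many of
    the observed log distance ratios fall inside each of the K equal-width
    bins spanning [lo, hi); values outside the range are discarded."""
    width = (hi - lo) / k
    hist = [0] * k
    for r in ratios:
        b = int((r - lo) / width)
        if 0 <= b < k:
            hist[b] += 1
    return hist
```

For two images related by a pure similarity, all N(N - 1)/2 ratios fall into the same bin, producing the sharp peak that distinguishes inlier-rich histograms from the smooth outlier model.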
[0052] The main advantage of the LDR is that it behaves differently for inlier and outlier pairs. For many image transformations (which govern how inliers behave) the LDR is confined to an interval. For outliers the LDR extends outside such intervals, and it has a distinctive probability density function that can be used for goodness-of-fit tests.
[0053] The LDR is a function of two generic key point pairs, i.e., (xi, yi) and (xj, yj). Three possible conditions can occur: either both pairs are outliers, both are inliers, or one pair is an inlier while the other is an outlier. 1.1 - The LDR when both pairs are outliers
[0054] The matching process is not constrained by any knowledge about the geometry of the scene in the images, since such knowledge is not available before the matching is carried out. There is no mechanism to prevent erroneous matches, even if the two images show the same objects. If the two images do not show the same or very similar objects, then every correspondence must be considered incorrect.
[0055] Even though the matching process is deterministic, the positions of incorrectly matched key points cannot be predicted. In general, it is impossible to discover any geometric pattern for outliers, and there are no first principles from which such a pattern could be deduced. Therefore, incorrect matching is regarded as a random process, and the behavior of outliers is modeled through an appropriate density function, referred to as the outlier model function.
[0056] Definition of the outlier model function. Let A and B be rectangles. Suppose that xi, xj ∈ A and yi, yj ∈ B are points drawn at random, corresponding to the random variables Xi, Xj and Yi, Yj. Let the random variable C be the LDR, C = ldr(Xi, Xj, Yi, Yj).
[0057] The outlier model function is a probability density function fc (c) for C.
[0058] The outlier model function can be expressed for two types of key point distributions: normal and uniform.
[0059] Coordinates of key points normally distributed. The assumption that key points are normally distributed leads to a simple formulation of the outlier model function, which is a good approximation of real cases.
[0060] It is supposed that the key points of the inquiry image are independent and identically distributed (iid), behaving like a random variable that is normally distributed with mean μ and covariance matrix σ²I:
Xn ~ N(μ, σ²I).
[0061] It is assumed that the coordinates have been suitably scaled so that the key points are distributed over the whole image (note that the variance is the same in the horizontal and vertical directions). Then, the difference between two key points likewise has a normal distribution:
Xi - Xj ~ N(0, 2σ²I).
[0062] Suppose that the key points {Yn} in the reference image have the same statistics as {Xn} and that each key point Xn is matched with a key point Yn. Then the squared distance ratio Rij² = ||Xi - Xj||² / ||Yi - Yj||² (Equation 5) has an F distribution with (2, 2) degrees of freedom, as shown, eg, in "An Introduction to Mathematical Statistics and its Applications" by R.J. Larsen and M.L. Marx, New Jersey, page 338, Prentice-Hall, second edition, 1986.
[0063] The probability density function of F(2,2) is
fS(s) = 1 / (1 + s)², s ≥ 0,
where the symbol for the random variable R in Equation 5 has been replaced by S for reasons of simplicity. Since the log distance ratio is being considered (and not the squared distance ratio), the square root and the logarithm are applied to the random variable S = Rij². In addition, in order to take into account different sizes of the two images or different distributions of the key points in the two images, the function is extended to such cases by multiplying the random variable by a parameter matching the ratio of the standard deviations of the key points in the two images, ie:

[0064] These modifications to the F(2,2) p.d.f. lead to the following outlier model function.
[0065] Outlier model function. Let two images have random key points {Xn} and {Yn}, all of which have a bivariate normal distribution, with variance σx² in the first image and σy² in the second image. Let a² be the ratio of the variances,
a² = σx² / σy².
[0066] Applicants have determined that the log distance ratio has the probability density function:
fc(c) = 2a² e^(2c) / (a² + e^(2c))².
[0067] The outlier model function in Equation 7 is the basis of the proposed method. The shape of the outlier model function is illustrated in Figure 2. It should be noted that this outlier model function does not take into account the aspect ratio of rectangular images, since the horizontal and vertical variances are assumed to be the same.
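By way of non-limiting illustration, the outlier model function can be sketched as follows; the closed form 2a²e^(2c)/(a² + e^(2c))² used here is a reconstruction obtained by applying the square root and the logarithm to an F(2,2) variable scaled by a², and should be checked against Equation 7:

```python
import numpy as np

def outlier_model(c, a=1.0):
    """Candidate outlier model density for the LDR (cf. Equation 7).

    a**2 is the ratio of the key point variances in the two images;
    for a = 1 the density reduces to the logistic-shaped function
    (1/2) / cosh(c)**2.
    """
    e = np.exp(2.0 * c)
    return 2.0 * a ** 2 * e / (a ** 2 + e) ** 2
```

The density integrates to one and, for a = 1, is symmetric around c = 0, consistently with the sign change of the LDR when the two images are exchanged.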
[0068] Figures 3A - 3F illustrate several examples, each showing a respective pair of images (inquiry image - reference image) taken from the Zurich Buildings Image Database (consisting of 1005 images, in 5 views each of 201 buildings). The key points are indicated by circles, while the lines point to the positions of the matching points in the other image. For each image pair, the LDR histogram and the corresponding outlier model function in the formulation of Equation 7 are illustrated. It should be noted that all key point matches need to be considered as outliers, since the images show different buildings. From these examples, it can be seen that the outlier model function closely matches the LDR histogram whenever all matches are outliers.
1.2 - The LDR when both pairs are inliers
[0069] In general, the LDR histogram for the case where all key point matches are inliers is very different from the LDR histogram for outliers. In a large number of practical cases, the LDR histogram for inliers only is narrower than the LDR histogram for outliers only, ie, it is equal to zero over a number of bins (specifically, the lateral ones) where the LDR histogram for outliers is not zero.
[0070] Since associated points xi and yi in the two images are related through a mapping of the same point on the viewed object, it is advantageous to consider the upper and lower limits of the LDR histogram instead of using probabilistic modeling.
[0071] The study performed here is limited to points on planar surfaces in the 3D scene, with the primary concern of recognizing objects with limited variations in depth. Planar surfaces approximate the visible portion of many objects of interest in images, such as buildings, books and signs.
[0072] Two images of points on a planar surface are related through a homography H,
λ (yT, 1)T = H (xT, 1)T,
where x and y are the coordinates of the projections of the same point in the two images. Inlier pairs on the same planar surface are therefore related through the same homography. The characteristics of the LDR histogram for affine transformations and for homographies in general will now be disclosed.
[0073] Affine transformations. An affine transformation is a special case of a homography,
H = [K t; 0T 1], so that y = K x + t.
[0074] The distance ratio is confined to an interval given by the singular values of the 2 × 2 matrix K,
smin ||xi - xj|| ≤ ||yi - yj|| ≤ smax ||xi - xj||.
[0075] In this case the LDR is in the range
-ln smax ≤ ldr(xi, xj, yi, yj) ≤ -ln smin.
[0076] The width of the non-zero portion of the LDR histogram therefore depends on how much the affine transformation deforms the objects. For a similarity transformation, the two singular values are the same, such that the LDR histogram has only one non-zero bin. If the affine transformation compresses lengths by at most a factor of 3 and expands them by at most a factor of 2, then the two singular values are 1/3 and 2, and the range of non-zero values of the LDR is
[-ln 2, ln 3].
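By way of non-limiting illustration, the LDR interval induced by an affine transformation can be computed from the singular values of K as follows (a Python sketch; the function name is illustrative):

```python
import numpy as np

def ldr_range_affine(K):
    """LDR interval [-ln(s_max), -ln(s_min)] for an affine map y = K x + t.

    s_min and s_max are the singular values of the 2x2 matrix K;
    since ||y_i - y_j|| = ||K (x_i - x_j)||, the length ratio, and
    hence the LDR, is bounded by them.
    """
    s = np.linalg.svd(np.asarray(K, dtype=float), compute_uv=False)
    return -np.log(s[0]), -np.log(s[-1])
```

With singular values 1/3 and 2 (the example above) this yields the interval [-ln 2, ln 3]; for a similarity transformation the two singular values coincide and the interval collapses to a single point.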
[0077] Homographies. Suppose xi, xj and yi, yj are related through a homography,
as in Equations 8 and 9. The LDR is in this case, too, restricted to a range
[-ln b, -ln a],
where a is the largest number and b is the smallest number such that
a ≤ ||yi - yj|| / ||xi - xj|| ≤ b.
[0078] For most practical cases of homographies, this interval is narrow relative to the LDR histogram for outliers, mainly due to the nature of the features that are employed. Features such as SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features) are invariant for similarity transformations, but not for affine transformations, let alone homographies. This means that if the perspective distortion is so severe that [-ln b, -ln a] could theoretically be wide, the key points that could produce extreme LDR values will not be matched, since their features will have different descriptors. Consequently, the inlier histogram for correctly associated key points is likely to remain within a relatively narrow range.
[0079] Figure 4 illustrates an exemplary case in which the inquiry image and the reference image represent the same planar object (a rectangle) seen from very different angles (in the example at issue, -75 and 0 degrees). The bottom diagram in Figure 4 represents the LDR histogram and the outlier model function calculated from the image pair.
[0080] Figures 5A and 5B illustrate two exemplary cases in which nearly planar objects (building façades) are shown with moderate differences in viewing angles. The bottom diagrams in Figures 5A and 5B represent the corresponding LDR histograms and outlier model functions.
1.3 - The LDR with pairs of both types
[0081] The third alternative provides that the pair (xi, yi) is an inlier and (xj, yj) is an outlier (or vice versa). In this case, likewise, it is assumed that the key points in an image are randomly distributed, as one cannot know in advance any pattern or geometric rule that constrains the location of key points contained in an unknown image.
[0082] Each key point can therefore be represented by a random variable, for example, with a normal distribution, as in Equation 3. The vector of difference between two key points is modeled as in Equation 4, since one is considered to be an inlier and the other to be an outlier, and there can be no correlation between them.
[0083] However, the F distribution of Equation 5 is not exactly maintained, since the numerator and denominator are not independent, contrary to the hypotheses for the F distribution. The key points in the case of an inlier / outlier pair are
where π is the (albeit unknown) mapping of the inlier key point in one image onto the other image. The random variable representing the squared distance ratio would in this case be
where the numerator and denominator are clearly not independent, since both are functions of xj. Finding the probability density for the variable in Equation 13 is very difficult, but as far as the proposed method is concerned, this might not be necessary. Experience suggests that, with a small error, it is possible to model the LDR histogram for these two cases (both pairs outliers, as well as an inlier/outlier pair) with the same model probabilities: the outlier model function in Equation 7.
Section 2 - Rejection of mismatched images (phase 1 of the proposed method)
[0084] The LDR histogram can be used to identify an object that is visible in an image (the inquiry image). Here, 'identification' means finding, among the reference images of a reference collection (the reference database), a reference image containing a view of an object represented in the inquiry image. Phase 1 of the proposed method allows identifying objects without the need to explicitly detect the inlier key point pairs between the inquiry image and the reference image.
[0085] Phase 1 of the proposed method is a geometry consistency check, which provides for making a binary decision between the hypotheses: H0: the reference image does not correspond to the inquiry image; H1: the reference image corresponds to the inquiry image.
[0086] The H0 hypothesis represents the expected state of things: it is known that almost all reference images do not contain a view of the object in the inquiry image. A certain amount of evidence is needed in order to reject H0 and accept H1 (the exceptional event). This evidence is found in the relationship between the LDR histogram and the outlier model function. If the histogram fits the outlier model function well, then the H0 hypothesis is accepted; if not, the H1 hypothesis is accepted.
[0087] In order to test such hypotheses, the proposed method provides for performing Pearson's chi-square test (see, for example, pages 402-403 of the aforementioned work by R.J. Larsen et al.).
[0088] Before applying the Pearson test, the concept of "discrete outlier model function" is introduced.
[0089] Let the bins, ie, the intervals of LDR values used to compose the LDR histogram, be denoted by Tk, k = 1, . . ., K. The discrete outlier model function assigns probability values to each of the K bins,
such that in each k-th bin the value is equal to the integral of the outlier model function over that bin,
p(k) = ∫Tk fc(c) dc,
where each value p(k) is called a "model probability". For uniformity of notation, the model probabilities will be regarded as elements of a sequence pk:
pk = p(k), k = 1, . . ., K.
[0090] Pearson's chi-square test will be performed between the LDR histogram and the outlier model function in discrete form.
[0091] Pearson test. At the significance level α, the H1 hypothesis is accepted if
Σk (hk - n pk)² / (n pk) > χ²(1-α, K-1),     (Equation 15)
where n = N(N - 1)/2 is the total number of observations used to construct the LDR histogram, ie, the number of key point match pairs (xi, yi), (xj, yj). The threshold χ²(1-α, K-1) is the 100(1 - α) percentile of the chi-square distribution with K - 1 degrees of freedom.
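By way of non-limiting illustration, the discretization of the outlier model function and the Pearson test can be sketched as follows (a Python sketch; the chi-square quantile is passed in as a precomputed threshold, eg, approximately 42.98 for α = 0.01 and K - 1 = 24 degrees of freedom):

```python
import numpy as np

def discrete_model(pdf, edges, m=200):
    """Discrete outlier model function (cf. Equation 14): the model
    probability p_k is the integral of the outlier pdf over the k-th
    bin, approximated here with the midpoint rule."""
    p = np.empty(len(edges) - 1)
    for k in range(len(p)):
        w = edges[k + 1] - edges[k]
        mid = edges[k] + (np.arange(m) + 0.5) * w / m
        p[k] = pdf(mid).sum() * w / m
    return p

def pearson_test(h, p, threshold):
    """Pearson chi-square statistic (cf. Equation 15); H1 (the images
    match) is accepted when the statistic exceeds the chi-square
    quantile passed as `threshold`."""
    h = np.asarray(h, dtype=float)
    n = h.sum()
    stat = ((h - n * p) ** 2 / (n * p)).sum()
    return stat, stat > threshold
```

A histogram that follows the discrete model closely yields a small statistic (H0 kept, no match), while a histogram concentrated in a few bins yields a large statistic (H1 accepted).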
[0092] Acceptance of the Hi hypothesis means that the reference image is a candidate to represent an object in the inquiry image. The margin by which the limit is exceeded can be used as a measure of similarity between the two images:
ρ = Σk (hk - n pk)² / (n pk) - χ²(1-α, K-1),
see Equation 15. The (index of the) reference image that has the largest ρ can be selected as the identity of the object in the inquiry image.
[0093] If more than one reference image has a large ρ, then either the inquiry image shows several objects, each present in one of the mentioned reference images, or the reference images themselves are mutually similar; the ambiguity can be resolved through knowledge of the identification task at hand.
[0094] The α parameter is the probability of accepting a wrong reference image, in the event that the LDR histogram actually originates from a source with the outlier model function as probability density.
Section 3 - Estimation of the number of inliers (phase 2 of the proposed method)
[0095] It is often interesting to know the number of inliers that are present in a set of correspondences of associated key points. Such a number may be useful in its own right, or it may be necessary to separate the inliers from the outliers, as will be shown in section 4.
[0096] This number can be estimated by exploring the LDR histogram. Since key point matching falls into two distinct categories, the histogram is made up of two terms called histogram components, one for each category. The histograms for inliers and outliers are very different, and in fact this difference is useful for separating the two components and expressing their relative weightings.
[0097] As previously argued, if the key point matches are outliers, then the histogram looks like the outlier model function. Conversely, if the histogram resembles the outlier model function, then the pairs are outliers. More generally, if the histogram can be decomposed into two terms where one term looks like the outlier model function, then that term is due to outliers. This principle will be used to estimate the number of outliers and even to identify them.
[0098] A pair of key point matches is an inlier pair if both (xi, yi) and (xj, yj) are correctly associated. If one or both of the key point matches are incorrectly associated, then the key point match pair is an outlier pair. Pin denotes the probability that a pair of key point matches contains only inliers, and Pout the probability that at least one of the elements in the pair is an outlier. Let z be the LDR value for a pair of key point matches, and let p(z | in) and p(z | out) denote the conditional probability densities. The conditional probability density for outliers is assumed to be the outlier model function in Equation 7,
p(z | out) = fc(z).
[0099] So the total probability density has the form
p(z) = Pin p(z | in) + Pout p(z | out).
[00100] This equation corresponds to the decomposition of the LDR histogram into two terms
hk = dk + gk,
where hk denotes the LDR histogram, dk is its component due to inliers, and gk is the component due to outliers. The method for estimating the number of inliers is based on the assumption that the outlier component gk is well approximated by its expected value, which leads to
hk = dk + n Pout pk,     (Equation 19)
where the model probability pk is the integral of the outlier model function over the k-th bin interval and n = N(N - 1)/2 is the number of key point match pairs used to construct the LDR histogram. There are two unknown quantities in Equation 19: the outlier probability Pout and the inlier component dk. Since the inlier component must be non-negative, Equation 19 can be rewritten as
hk - n Pout pk ≥ 0,     (Equation 20)
which eliminates the inlier component from the equation. It is assumed that the inlier component is zero over some intervals, as argued in section 1, so over those intervals the outlier component must be equal to the histogram values. This means that the outlier probability Pout must be large enough to make the difference in Equation 20 reach the lower limit 0 for some bin index k. Therefore, a search is carried out for the largest possible value of the outlier probability within a small set of predefined values.
[00101] Search for the outlier probability. Let hk, k = 1, . . ., K denote the bins of the LDR histogram. Let pk denote the model probabilities, and let n denote the number of key point match pairs used to construct the histogram. Let B = {β1, . . ., βL} ⊂ [0, 1] be a set of predefined values eligible for Pout. Applicants have determined that the estimated probability that a key point match pair contains at least one outlier is
Pout = max {β ∈ B : hk - n β pk ≥ 0 for k = 1, . . ., K}.
[00102] The probability 1 - Pout is the relative fraction of inliers among all pairs of key point matches (xi, yi), (xj, yj). In order to obtain the number of inlier matches (xi, yi), the number N of key point matches and the number n = N(N - 1)/2 of key point match pairs have to be considered, since the histogram is built over all pairs (xi, yi), (xj, yj) such that i < j. If the number of inlier matches is denoted by m, then the fraction of key point match pairs consisting of inliers is
1 - Pout = m(m - 1) / (N(N - 1)).
[00103] Being based on estimates and on assumptions about the distributions, Equation 22 has a low degree of precision. Therefore, an approximation of m is preferable.
[00104] The estimated number of pairs of inlier key points is then
m ≈ N √(1 - Pout).
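By way of non-limiting illustration, the search for the outlier probability and the resulting inlier count estimate can be sketched as follows (a Python sketch; the approximation m ≈ N√(1 - Pout) is used in place of the exact solution of Equation 22, and the grid of candidate values B is illustrative):

```python
import numpy as np

def estimate_outlier_probability(h, p, betas=None):
    """Largest candidate beta with n*beta*p_k <= h_k for every bin
    (the 'search for the outlier probability', cf. Equation 21)."""
    h = np.asarray(h, dtype=float)
    n = h.sum()
    if betas is None:
        betas = np.linspace(0.0, 1.0, 101)  # illustrative grid B
    best = 0.0
    for b in betas:
        if np.all(n * b * p <= h + 1e-12):
            best = max(best, b)
    return best

def estimate_num_inliers(N, p_out):
    """Estimated inlier count, using the approximation
    m ~ N * sqrt(1 - Pout) of Equation 23."""
    return int(round(N * np.sqrt(1.0 - p_out)))
```

The weighted discrete model n·β·pk is raised until it touches the histogram in some bin; the touching point gives the estimated Pout, and 1 - Pout gives the inlier fraction among match pairs.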
[00105] Figure 6 shows an example of scaling the model probabilities to estimate the number of inliers. In this example, the same object (a house) is represented in two images. The bottom diagram shows the LDR histogram (drawn with a solid line) and the estimated outlier histogram (drawn with a dash-dotted line). In this case, the number of inliers has been estimated to be one third of the total number of key points.
Section 4 - Identification of the most likely inliers (phase 3 of the proposed method)
[00106] After the preceding steps have been completed, all necessary quantities are available to determine the inlier histogram component,
dk = hk - n Pout pk
(see Equation 19). This component can be used to separate inliers from outliers, as shown in this section.
[00107] Each pair of key point matches corresponds to a log distance ratio value, and the inlier histogram component expresses how likely it is that the pair contains inliers. This information is used to formulate a likelihood function with a binary value for each of the key point matches as parameters; the value 1 means that the match is an inlier, 0 means that it is an outlier. With a certain number of inliers as a constraint, the parameters that maximize this likelihood function will indicate the most likely set of inliers.
[00108] Let N be the number of key point matches, and let u be a binary vector of N elements
u = [u1 . . . uN]T, ui ∈ {0, 1}.
[00109] Elements with a value of 1 indicate that the corresponding key point matches are inliers; those with a value of 0 indicate outliers. Since a procedure like the one in the previous section produces an estimate of the number m of inliers (Equation 22), it is possible to add the constraint
u1 + u2 + . . . + uN = m.
[00110] The LDR in Equation 2 is known for each pair of key point matches,
zij = ldr(xi, xj, yi, yj).
[00111] Ideally, if the conditional probability density for inliers were known, one could assign a likelihood value to any hypothesized set of inliers, simply by adding up the probabilities for the inliers,
L = Σ(i,j): ui = uj = 1 p(zij | in).
[00112] Since u is binary, this sum can be written
L(u) = Σi Σj ui uj p(zij | in).
[00113] The binary vector u that maximizes L in Equation 27 under the restrictions of Equations 23 and 24 represents the most likely set of m inliers.
[00114] Equation 27 needs to be modified to lead to a practical algorithm. In the absence of a closed form for the inlier probability density p(z | in), it is replaced by the inlier histogram component d of Equation 19. In order to see this passage, it is useful to introduce a quantization operator q,
q(z) = arg mink | z - zk |,
which produces the index of the bin whose center zk (among the bin centers z1, . . ., zK) is closest to the value z. This allows for the approximation
p(z | in) ≈ dq(z) / (n Pin δ).
[00115] The following equation shows that this approximate value of the probability is proportional to the expected value of the inlier component of the LDR histogram:
E[ dq(z) ] = n Pin δ p(z | in),
where the proportionality constants are: n, the total number of key point match pairs; Pin, the probability that both matches in a pair are inliers; and δ, the width of a bin.
[00116] The ideal probability function in Equation 27 can now be replaced by
G(u) = Σi Σj ui uj dq(zij),
where the factors in Equation 31 have been omitted, as they do not change the solution that maximizes G(u).
[00117] In the form of a matrix the above equation becomes:
G(u) = uT D u,
where the matrix D contains values of the inlier histogram component, with Dij = dq(zij) as element i, j.
[00118] The inlier identification problem can now be expressed
maxu uT D u, subject to ui ∈ {0, 1} and u1 + . . . + uN = m.
[00119] When the matrix D has full rank, the optimum is very difficult to compute. A route to an approximate solution is provided in "Improving shape retrieval by spectral matching and meta similarity" by A. Egozi, Y. Keller and H. Guterman, IEEE Transactions on Image Processing, vol. 19, pages 1319-1326, May 2010, which confronts a problem similar to that of Equation 33. Here, the binary optimization is replaced by a simpler one,
max‖w‖ = 1 wT D w,
where the solution is the dominant eigenvector of D (the eigenvector that corresponds to the largest eigenvalue). The elements of this vector are often either close to zero or close to an extreme value, and the sign of the eigenvector can be chosen such that the largest values are positive. Then the eigenvector w is used to obtain a binary vector u by taking its m largest elements (m is the estimated number of inliers of Equation 22), according to the following relations:
[w̃, i] = sort(w, 'descend'),
u(i(k)) = 1 for k = 1, . . ., m, u(i(k)) = 0 for k = m + 1, . . ., N,
where sort(w, 'descend') is the Matlab function (by MathWorks) that orders the elements of the vector w in descending order, generating a corresponding ordered vector w̃. The sort(w, 'descend') function also generates an additional vector i whose elements are the indexes of the elements of w, ordered as in w̃.
[00120] The result u is a good approximation of the inlier set, as can be seen in the practical experiments.
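By way of non-limiting illustration, the selection of the most likely inliers from the dominant eigenvector of D can be sketched as follows (a Python sketch using the power iteration initialized with the mean column of D; function names are illustrative):

```python
import numpy as np

def dominant_eigenvector(D, iters=100):
    """Power iteration for the dominant eigenvector of a non-negative
    symmetric matrix D, initialized with the mean column of D."""
    w = D.mean(axis=1)
    w = w / np.linalg.norm(w)
    for _ in range(iters):
        w = D @ w
        w = w / np.linalg.norm(w)
    return w

def select_inliers(D, m):
    """Binary vector u with 1s at the m largest entries of the dominant
    eigenvector (the counterpart of sort(w, 'descend') in the text)."""
    w = dominant_eigenvector(D)
    if w.sum() < 0:          # choose the sign so the largest values are positive
        w = -w
    idx = np.argsort(-w)     # indexes in descending order of w
    u = np.zeros(len(w), dtype=int)
    u[idx[:m]] = 1
    return u
```

For a matrix D with a block of mutually consistent (high-valued) matches, the entries of the dominant eigenvector corresponding to that block are the largest, so the m inliers are recovered.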
[00121] Fast eigenvector computation. The estimated inliers correspond to the m largest elements in the dominant eigenvector of D. The objective is to keep the computation of the eigenvector as fast as possible, even at the expense of some precision. Methods for finding the dominant eigenvector are known in the art (see, for example, the power iteration and the Rayleigh quotient iteration described in "Numerical Linear Algebra" by L. Trefethen and D. Bau, The Society for Industrial and Applied Mathematics, 1997).
[00122] Both methods are iterative and are based on an initial estimate of the dominant eigenvector; a readily available approximate candidate is the mean column of a non-negative input matrix such as D.
Section 5 - Main steps of the method
[00123] The main steps of the previously described method will now be illustrated in Figures 7A - 7C.
[00124] Specifically, Figure 7A is a flow chart illustrating the main steps of the first phase of the method according to an embodiment of the present invention. As already mentioned above, the first phase of the method serves to determine whether a first image and a second image represent the same object or not.
[00125] It is assumed to start with a pair of images to be compared, ie, a first image (the inquiry image) comprising N key points xi and a second image (the reference image) comprising N key points yi. Each key point xi in the inquiry image is associated with a corresponding key point yi in the reference image in order to define a corresponding key point match (xi, yi).
[00126] The first step involves generating a distance ratio histogram from the key point matches (xi, yi) using a distance ratio function that is invariant for similarities. For example, the Log Distance Ratio (LDR) histogram is generated from the key point matches (xi, yi) using Equation 2 (block 702).
[00127] A corresponding outlier model function is then generated using a probability density function associated with the distance ratio function used in the first step, for example, using Equation 7 (block 704) in the case of a Log Distance Ratio (LDR) function as defined in Equation 2.
[00128] The next step consists of discretizing the previously calculated outlier model function (block 706), for example, applying Equation 14 to the previously calculated outlier model function in order to obtain a discrete version thereof.
[00129] The LDR histogram is then compared with the outlier model function in discrete form using Pearson's test (Equation 15) to check whether all the key point matches are to be considered random (block 708).
[00130] Specifically, if the Pearson test result implies a good fit of the LDR histogram with the outlier model in discrete form (output branch Y of block 710), this means that all or almost all key point matches are outliers, and so the reference image does not show any object represented in the inquiry image. In this case, the method ends.
[00131] Conversely, if the Pearson test result implies that the LDR histogram does not fit the outlier model in discrete form (output branch N of block 710), this means that many of the key point matches are likely to be inliers, and so the reference image can probably show an object already represented in the inquiry image. In the latter case, if desired, the method proceeds to the second stage.
[00132] Figure 7B is a flow chart illustrating the main steps of the second phase of the method according to an embodiment of the present invention. This phase, based on the "search for the outlier probability" previously described, allows estimating how many key point matches among the set of key point matches are inliers.
[00133] The first step initializes the factor β of Equation 21 (block 712); for example, β is initialized to zero.
[00134] The factor β is used to weight the discrete outlier model function to be compared with the LDR histogram. The objective of this step is to estimate, through Equation 21, the probability that any given pair of key point matches contains at least one outlier. Specifically, nβpk is calculated for each k (block 714), the term β is updated by adding a predetermined amount to the previously assumed value (block 716), and the comparison is made between the previously calculated nβpk and the corresponding hk (for each k).
[00135] If, for each k, hk turns out to be greater than the previously calculated nβpk (output branch Y of block 718), meaning that the weighted discrete outlier model function lies below the LDR histogram, a new calculation of nβpk is performed exploiting the updated value of β (return to block 714).
[00136] When instead nβpk reaches hk for at least one k (output branch N of block 718), this means that portions of the weighted discrete outlier model function (specifically, its lateral ends) have reached - or exceeded - corresponding portions of the LDR histogram. Therefore, according to Equation 21, the probability Pout that a key point match pair contains at least one outlier is estimated to be equal to the last value assumed by β (block 720). The estimated probability Pin that a key point match pair contains only inliers is thus set equal to 1 - Pout (block 722).
[00137] The number m of inliers is then calculated using Equation 23 (block 724).
[00138] At this point, if desired, the method proceeds to the third phase.
[00139] Figure 7C is a flow chart illustrating the main steps of the third phase of the method according to an embodiment of the present invention. This phase specifically allows identifying which key point matches are inliers, and which key point matches are outliers, solving the maximization problem 36.
[00140] The first step builds the inlier matrix D as expressed in relation 34 (block 726).
[00141] The maximization problem 36 is then solved to find the dominant eigenvector w of the inlier matrix D (block 728).
[00142] Finally, an approximation of the inlier set is calculated using the eigenvector w previously found in relations 37 and 38 (block 730).
[00143] The steps of the method described in this section can be performed by appropriate processing units, whose structure and function depend on the specific field of application for which they are intended. For example, each processing unit can be a hardware unit specifically designed to perform one or more steps of the method. Furthermore, the steps of the method can be carried out by a programmable machine (eg, a computer) under the control of a corresponding set of instructions.
Section 6 - Some exemplary applications of the method
[00144] Figure 8 schematically illustrates a possible scenario in which the previously described method can be exploited to implement a visual search service according to embodiments of the present invention. The scenario of Figure 8 - identified with reference 800 - is structured according to a client-server configuration, in which a visual search server 810 is configured to interact with a plurality of terminals 820 to exchange data through an external network 830, such as a MAN, a WAN, a VPN, the Internet or a telephone network. Each terminal 820 can be a personal computer, a notebook, a laptop, a personal digital assistant, a smartphone, or any electronic device capable of managing a digital image.
[00145] According to an embodiment of the present invention illustrated in figure 9A, all the main operations of the visual search service are performed by the visual search server 810.
[00146] A user of a terminal 820 requesting information related to an object represented in a picture sends the mentioned picture (which becomes the inquiry image) to the visual search server 810 through the network 830.
[00147] The visual search server 810 includes a server interface 902 adapted to interact with the network 830 to receive/transmit data from/to the terminals 820. Through the server interface 902, the visual search server 810 receives the inquiry image to be analyzed.
[00148] The inquiry image is provided to the key point detection unit 904, configured to identify the key points included in the mentioned image.
[00149] Once the key points are identified, their local aspect is described by a feature computing unit 906. This operation is performed by the feature computing unit 906 using known local descriptors, such as the Scale-Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF).
[00150] The visual search server 810 further includes a feature matching unit 908 coupled with a reference database 910 storing the reference images to be exploited for image recognition. The comparison between the local descriptors extracted from the inquiry image and the local descriptors of the reference images stored in the reference database is carried out by the feature matching unit 908 using known techniques for comparing image features, for example, based on the Euclidean distance between descriptors. The feature matching unit 908 outputs a corresponding list including, for each reference image of the reference database, a corresponding set of key point matches. This list may be empty in the event that the objects represented in the inquiry image do not correspond to any object represented in any reference image.
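By way of non-limiting illustration, the comparison of local descriptors based on the Euclidean distance can be sketched as follows (a Python sketch; the ratio test used to discard ambiguous matches is one common choice and is not mandated by the present description):

```python
import numpy as np

def match_descriptors(query_desc, ref_desc, ratio=0.8):
    """Nearest-neighbour matching of local descriptors by Euclidean
    distance, with a ratio test to discard ambiguous matches.
    Returns a list of (query_index, reference_index) key point matches."""
    matches = []
    for i, q in enumerate(query_desc):
        d = np.linalg.norm(ref_desc - q, axis=1)  # distance to every reference descriptor
        order = np.argsort(d)
        best, second = order[0], order[1]
        if d[best] < ratio * d[second]:           # keep only unambiguous matches
            matches.append((i, int(best)))
    return matches
```

The resulting list of key point matches is the input to the LDR-based geometry consistency check described in the previous sections.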
[00151] Based on the list generated by the characteristics matching unit 908, the selection unit 912 selects the first q reference images that share the largest number of key point matches with the inquiry image. These reference images are supposed to be the best candidates for including an object represented in the inquiry image.
[00152] In accordance with an embodiment of the present invention, the visual search server 810 further includes an optimization unit 914 configured to implement the previously described method. The optimization unit 914 applies said method to the key point matches corresponding to the set of q reference images selected by the selection unit 912: for each pair consisting of the inquiry image and a reference image of the set, the optimization unit 914 calculates the number of correct key point matches (inliers). This calculation is performed according to the first phase of the method, preferably according to the first two phases of the method (i.e., the phases illustrated in Figures 7A and 7B). If the third phase of the method, illustrated in Figure 7C, is carried out as well (for example, when it is desired to obtain an indication of where the objects represented in the inquiry image are located in the reference images), the optimization unit 914 is capable of specifically identifying which key point matches are to be considered inliers. The reference images of the set that turn out to include a sufficient number of key points correctly matching corresponding key points of the inquiry image are considered to include at least (a portion of) the same object represented in the inquiry image. These reference images are then sent back to the terminal 820 through the network 830 as a result of the visual search request, possibly ordered based on the number of counted inliers.
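The final ordering step described above can be sketched as follows; the `min_inliers` threshold is an assumption chosen for illustration, not a value prescribed by the method.

```python
def rank_references(inlier_counts, min_inliers=8):
    """Keep the reference images with enough correct key point matches
    (inliers) and order them by decreasing inlier count."""
    kept = [(img, n) for img, n in inlier_counts.items() if n >= min_inliers]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, `rank_references({'a': 12, 'b': 3, 'c': 20})` discards image `b` and returns `c` before `a`.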
[00153] According to a further embodiment of the present invention, illustrated in Figure 9B, the key point detection unit 904 and the characteristic computing unit 906 are included in the terminals 820 instead of in the visual search server 810. In this case, instead of sending the inquiry image to the visual search server 810, each terminal 820 directly sends the local descriptors generated locally from the inquiry image.
[00154] Compared to the previous embodiment, this solution requires the transmission of a smaller amount of data (the local descriptors instead of the entire inquiry image). Furthermore, according to this embodiment, the computational load to be managed by the visual search server 810 is reduced, allowing the latter to manage more image search requests at the same time.
[00155] According to a still further embodiment of the present invention, illustrated in Figure 9C, almost all the main operations of the visual search service are performed by the terminals 820, with the visual search server 810 merely storing the key points and the local descriptors of the reference images and sending selected subsets thereof to the terminals based on the specific visual search requested by the users of the terminals. For example, in case the terminal 820 is a smartphone equipped with a GPS system and the inquiry image is a picture captured with the camera of the smartphone itself, the selection of which key points and local descriptors are to be sent by the visual search server 810 can be based on the actual position of the terminal 820; this solution can be advantageously exploited for visual search services such as monument recognition services.
[00156] In order to be able to manage the image comparison operations, the terminal 820 is provided with a local reference database 916 and an updater unit 920, the latter being adapted to receive the key points and the local descriptors transmitted by the visual search server 810 and to update the former accordingly. It has to be appreciated that it is not strictly necessary to update the local reference database 916 every time an image comparison has to be carried out; it is sufficient to exploit the key points and the local descriptors already stored therein. For example, the local reference database 916 may be updated by the visual search server 810 only once a day.
[00157] Compared with the previous embodiments, this solution is faster, since the amount of data to be transmitted is greatly reduced. Therefore, this solution is particularly suited for augmented reality applications.
[00158] A further possible application of the proposed method is the automatic calibration of video cameras belonging to a stereoscopic camera system. The purpose of the calibration is to generate the so-called fundamental matrix, i.e., a matrix that describes the intrinsic and extrinsic parameters of the acquisition system. The intrinsic parameters describe the camera settings (e.g., the focal length), while the extrinsic parameters describe the position of the camera within the space.
[00159] As illustrated in the schematic flow chart of Figure 10, a first camera 1002 acquires a first image (block 1004), which is processed in order to identify corresponding first key points (block 1006). Once the first key points have been identified, the local aspect thereof is described through corresponding first local descriptors (block 1008). Similarly, a second camera 1010 acquires a second image (block 1012), which is processed in order to find corresponding second key points (block 1014). Then, the local aspect of those key points is described through corresponding second local descriptors (block 1016).
[00160] By comparing the first local descriptors with the second local descriptors, key point matches between the first and second images are generated (block 1018). Then, by applying the three phases of the method illustrated in Figures 7A-7C, the key point matches that are inliers are identified (block 1020).
[00161] Once the inliers have been identified, an iterative procedure is performed in which the fundamental matrix is estimated (block 1022) in order to find new key point matches (block 1024). These operations can be performed following the procedure described in "In Defense of the Eight-Point Algorithm" by R. Hartley, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 6, June 1997. The new key point matches are then processed again with the three phases of the method illustrated in Figures 7A-7C in order to identify the inliers (block 1026). This procedure (i.e., the one corresponding to blocks 1022, 1024 and 1026) is repeated until the number of inliers becomes stable.
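The fundamental-matrix estimation step (block 1022) can be sketched with Hartley's normalized eight-point algorithm. The following is an illustrative implementation under the assumption of exact, noise-free correspondences, not the patent's own code; in a real system it would be run inside the iterative loop of blocks 1022-1026.

```python
import numpy as np

def normalize(pts):
    """Translate points to their centroid and scale them so that the
    mean distance from the origin is sqrt(2) (Hartley normalization)."""
    c = pts.mean(axis=0)
    d = np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2) / d
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    ph = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ ph.T).T, T

def eight_point(p1, p2):
    """Estimate the fundamental matrix F from >= 8 key point matches
    so that x2^T F x1 ~ 0 for matching homogeneous points x1, x2."""
    x1, T1 = normalize(np.asarray(p1, float))
    x2, T2 = normalize(np.asarray(p2, float))
    # Each match contributes one row of the linear system A f = 0.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 (a valid fundamental matrix is singular).
    U, S, Vt2 = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt2
    F = T2.T @ F @ T1          # undo the normalization
    return F / np.linalg.norm(F)
```

In the loop of blocks 1022-1026, the matrix estimated from the current inliers is used to search for further key point matches along the epipolar lines, and the procedure stops when the inlier count no longer changes.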
[00162] The foregoing description presents and discusses in detail various embodiments of the present invention; however, several changes to the described embodiments, as well as different embodiments of the invention, are possible without departing from the scope defined by the appended claims.
[00163] For example, although in the present description reference has been made to the log distance ratio (LDR), similar considerations apply if the histograms are constructed with a different distance ratio, such as a plain distance ratio without the logarithm; moreover, similar considerations apply if the histograms are constructed with multiples and/or powers of the log distance ratio.
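As an illustrative sketch of the log distance ratio statistic discussed throughout the description, the following fragment accumulates the LDR histogram over all pairs of key point matches; the bin range and bin count are arbitrary choices for the example, not values fixed by the method.

```python
import math
from itertools import combinations

def ldr_histogram(kp1, kp2, matches, bins=8, lo=-4.0, hi=4.0):
    """Histogram of the log distance ratio (LDR) over all pairs of key
    point matches; kp1/kp2 hold (x, y) coordinates and each match is a
    pair (index_in_kp1, index_in_kp2)."""
    hist = [0] * bins
    for (i1, i2), (j1, j2) in combinations(matches, 2):
        d1 = math.dist(kp1[i1], kp1[j1])   # distance in the first image
        d2 = math.dist(kp2[i2], kp2[j2])   # distance in the second image
        if d1 == 0.0 or d2 == 0.0:
            continue                        # coincident key points carry no ratio
        ldr = math.log(d1 / d2)
        b = min(bins - 1, max(0, int((ldr - lo) / (hi - lo) * bins)))
        hist[b] += 1
    return hist
```

For matches related by a pure scaling of factor 2, every pair yields the same LDR, log(1/2), so all counts fall into a single bin, whereas random (outlier) pairings spread across the bins according to the model function.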
[00164] Furthermore, nothing prevents the statistical distribution of the distance ratios from being represented in a way different from a histogram; in this case, the Pearson test should be replaced by an equivalent test compatible with the specific representation chosen.
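The Pearson comparison mentioned here amounts to computing the chi-square statistic between the observed histogram and the expected counts implied by the model probabilities. The following is the generic textbook formulation, offered only as an illustration of the test, not as the patent's implementation.

```python
def pearson_chi_square(hist, model_probs):
    """Pearson's chi-square statistic between observed bin counts and
    the expected counts implied by the model probabilities."""
    n = sum(hist)
    chi2 = 0.0
    for observed, p in zip(hist, model_probs):
        expected = n * p
        if expected > 0.0:
            chi2 += (observed - expected) ** 2 / expected
    return chi2
```

A small statistic means the histogram is compatible with the pure-outlier model function (no object in common between the images), while a large statistic signals the presence of inliers.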
[00165] Furthermore, the concepts of the present invention can be applied even if the widths of the bins of the histograms are different from each other.
Claims:
Claims (17)
[0001]
1. Method for comparing a first image with a second image, characterized by the fact that it comprises: - identifying first key points in the first image and second key points in the second image; - associating each first key point with a corresponding second key point so as to form a corresponding key point match; - calculating a plurality of distance ratios by selecting pairs of key point matches and, for each selected pair of key point matches, calculating a distance ratio between a first distance and a second distance or a distance ratio between the second distance and the first distance, the first distance being the distance between the first key points of said pair of key point matches and the second distance being the distance between the second key points of said pair of key point matches; - calculating a statistical distribution of the plurality of calculated distance ratios; - calculating a plurality of outlier distance ratios by randomly selecting sets of four key points, each set containing two randomly selected first key points and two randomly selected second key points, and calculating the distance ratio between the distance between said randomly selected first key points and the distance between said randomly selected second key points, or calculating the distance ratio between the distance between said randomly selected second key points and the distance between said randomly selected first key points; - generating a model function expressing a statistical distribution of the outlier distance ratios; - comparing the statistical distribution of the plurality of distance ratios with the model function, and - confirming whether the first image contains a view of an object represented in the second image based on said comparison.
[0002]
2. Method according to claim 1, characterized in that it additionally includes: - arranging the statistical distribution of the plurality of distance ratios in the form of a histogram having a plurality of ordered bins, each corresponding to a respective range of distance ratio values, the histogram listing for each bin a corresponding number of distance ratios of the statistical distribution having values within the respective range, and - for each bin, generating a corresponding model probability corresponding to the integral of the model function over said bin, where: - the step of comparing the statistical distribution of the plurality of distance ratios with the model function includes comparing the histogram with the model probabilities.
[0003]
3. Method according to claim 2, characterized by the fact that the comparison of the histogram with the model probabilities comprises performing a Pearson chi-square test.
[0004]
4. Method according to claim 1, characterized by the fact that calculating the distance ratios provides for calculating the logarithm of the distance ratios.
[0005]
5. Method according to claim 2 or 3, characterized in that it additionally comprises estimating a number of incorrect key point matches, an incorrect key point match being formed by a first and a second key point that do not correspond to a same point of a same object represented in the first and second images, estimating the number of incorrect key point matches including: - initializing a weighting parameter to an initial value; - repeating operations a) and b): a) weighting the model probabilities with the weighting parameter; b) increasing the value of the weighting parameter, until the value of at least one weighted model probability reaches the number of distance ratios listed by the histogram in the bin corresponding to said model probability, and c) determining the number of incorrect key point matches based on the last value assumed by the weighting parameter.
[0006]
6. Method according to claim 5, characterized in that it additionally comprises estimating a number of correct key point matches, a correct key point match being formed by a first and a second key point corresponding to a same point of a same object represented in the first and second images, the estimate of the number of correct key point matches being based on the number of key point matches multiplied by a term equal to the square root of one minus the last value assumed by the weighting parameter.
[0007]
7. Method according to claim 6, characterized in that it additionally comprises: - calculating a matrix, each element of the matrix corresponding to a respective pair of key point matches and having a value corresponding to the difference between the value assumed by the histogram in the bin including the distance ratio of the respective key point match pair and the weighted model probability corresponding to said bin; - finding a dominant eigenvector of the matrix, i.e., the eigenvector corresponding to the largest eigenvalue of the matrix, and - identifying which key point matches are most likely to be correct key point matches based on the dominant eigenvector.
[0008]
8. Method according to claim 7, characterized by the fact that identifying which key point matches are most likely to be correct key point matches includes identifying the elements of the dominant eigenvector having the highest absolute values.
[0009]
9. Apparatus for comparing a first image with a second image, characterized by the fact that the apparatus comprises: - a first identification unit configured to identify first key points in the first image and second key points in the second image; - an association unit configured to associate each first key point with a corresponding second key point so as to form a corresponding key point match; - a first calculation unit configured to calculate a plurality of distance ratios by selecting pairs of key point matches and, for each selected pair of key point matches, calculating a distance ratio between a first distance and a second distance or a distance ratio between the second distance and the first distance, the first distance being the distance between the first key points of said pair of key point matches and the second distance being the distance between the second key points of said pair of key point matches; - a second calculation unit configured to calculate a statistical distribution of the plurality of calculated distance ratios; - a third calculation unit configured to calculate a plurality of outlier distance ratios by randomly selecting sets of four key points, each set containing two randomly selected first key points and two randomly selected second key points, and calculating the distance ratio between the distance between said randomly selected first key points and the distance between said randomly selected second key points, or calculating the distance ratio between the distance between said randomly selected second key points and the distance between said randomly selected first key points; - a first generation unit configured to generate a model function expressing a statistical distribution of the outlier distance ratios; - a first comparison unit configured to compare the statistical distribution of the plurality of distance ratios with the model function, and - a confirmation unit configured to confirm whether the first image contains a view of an object represented in the second image based on the comparison.
[0010]
10. Apparatus according to claim 9, characterized in that it additionally includes: - a storage unit configured to arrange the statistical distribution of the plurality of distance ratios in the form of a histogram having a plurality of ordered bins, each corresponding to a respective range of distance ratio values, the histogram listing for each bin a corresponding number of distance ratios of the statistical distribution having values within the respective range, and - a second generation unit configured to generate, for each bin, a corresponding model probability corresponding to the integral of the model function over said bin, in which: - the first comparison unit includes a second comparison unit configured to compare the histogram with the model probabilities.
[0011]
11. Apparatus according to claim 10, characterized in that it additionally comprises a first estimation unit configured to estimate a number of incorrect key point matches, an incorrect key point match being formed by a first and a second key point that do not correspond to a same point of a same object represented in the first and second images, the first estimation unit including: - an initialization unit configured to initialize a weighting parameter to an initial value; - a weighting unit configured to repeat operations a) and b): a) weighting the model probabilities with the weighting parameter; b) increasing the value of the weighting parameter, until the value of at least one weighted model probability reaches the number of distance ratios listed by the histogram in the bin corresponding to said model probability, and - a determination unit configured to determine the number of incorrect key point matches based on the last value assumed by the weighting parameter.
[0012]
12. Apparatus according to claim 11, characterized in that it additionally comprises a second estimation unit configured to estimate a number of correct key point matches, a correct key point match being formed by a first and a second key point that correspond to a same point of a same object represented in the first and second images, the second estimation unit being configured to estimate the number of correct key point matches based on the number of key point matches multiplied by a term equal to the square root of one minus the last value assumed by the weighting parameter.
[0013]
13. Apparatus according to claim 12, characterized in that it additionally comprises: - a fifth calculation unit configured to calculate a matrix, each element of the matrix corresponding to a respective pair of key point matches and having a value corresponding to the difference between the value assumed by the histogram in the bin including the distance ratio of the respective key point match pair and the weighted model probability corresponding to said bin; - a find unit configured to find a dominant eigenvector of the matrix, i.e., the eigenvector corresponding to the largest eigenvalue of the matrix, and - a second identification unit configured to identify which key point matches are most likely to be correct key point matches based on said dominant eigenvector.
[0014]
14. System for comparing images, characterized by the fact that it includes: - a key point detection unit configured to receive an inquiry image and to identify corresponding first key points in said image; - a characteristic computing unit configured to describe the local aspect of the first key points through corresponding first local descriptors; - a reference database storing a plurality of reference images and, for each reference image, corresponding second key points and corresponding second local descriptors of the second key points; - a characteristic matching unit configured to compare, for each reference image of at least one group of reference images, the first local descriptors with the second local descriptors of said reference image, and accordingly associate the first key points with the second key points of said reference image to generate a corresponding set of key point matches; - a selection unit configured to select a subset of the reference images based on the comparisons made by the characteristic matching unit, and - an optimization unit configured to calculate, for each pair comprising the inquiry image and a reference image of said subset, the number of correct key point matches, wherein the optimization unit includes the apparatus as defined in claim 12 or 13.
[0015]
15. System according to claim 14, characterized by the fact that it comprises a visual search server and a plurality of terminals configured to provide inquiry images to the visual search server through a network, in which: - the visual search server includes the key point detection unit, the characteristic computing unit, the reference database, the characteristic matching unit, the selection unit and the optimization unit.
[0016]
16. System according to claim 14, characterized in that it additionally comprises a visual search server and a plurality of terminals configured to provide inquiry images to the visual search server through a network, in which: - the visual search server includes the reference database, the characteristic matching unit, the selection unit and the optimization unit, and each terminal includes a respective key point detection unit and a respective characteristic computing unit.
[0017]
17. System according to claim 14, characterized by the fact that it additionally comprises a visual search server and a plurality of terminals configured to exchange data with the visual search server through a network, in which: - the visual search server includes the reference database, and each terminal includes a respective key point detection unit, a respective characteristic computing unit, a respective characteristic matching unit, a respective selection unit, a respective optimization unit and a respective local database, where: each terminal is configured to receive from the visual search server a respective set of second key points and corresponding second local descriptors of the second key points stored in the reference database, and the local database of the terminal is configured to store the received set of second key points and second local descriptors, the stored set of second key points and second local descriptors corresponding to the reference images of at least one group of reference images.
Similar technologies:
Publication number | Publication date | Patent title
BR112013019031B1|2021-03-09|method and apparatus for comparing a first image with a second image, and a system for comparing images.
Hong et al.2014|Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval
Sahillioǧlu et al.2011|Coarse‐to‐fine combinatorial matching for dense isometric shape correspondence
Ma et al.2013|Regularized vector field learning with sparse approximation for mismatch removal
Torresani et al.2008|Feature correspondence via graph matching: Models and global optimization
BR112014016301B1|2022-02-01|Method and apparatus for comparing a first image with a second image, system, and method for retrieving images
JP2015504215A5|2015-10-29|
GB2550567A|2017-11-29|Point Cloud Matching Method
Park et al.2003|Recognition of partially occluded objects using probabilistic ARG-based matching|
Dou et al.2014|Robust visual tracking base on adaptively multi-feature fusion and particle filter
Gao et al.2014|A robust and outlier-adaptive method for non-rigid point registration
Zhang et al.2020|Learning noise-aware encoder-decoder from noisy labels by alternating back-propagation for saliency detection
Sun et al.2017|Image matching via feature fusion and coherent constraint
Armiti et al.2014|Geometric graph matching and similarity: A probabilistic approach
Sun et al.2016|Progressive match expansion via coherent subspace constraint
Campbell et al.2020|Solving the blind perspective-n-point problem end-to-end with robust differentiable geometric optimization
Yang et al.2016|Robust image registration using adaptive coherent point drift method
Sharma et al.2020|Voxel-based 3D occlusion-invariant face recognition using game theory and simulated annealing
Zhang et al.2021|Affinity fusion graph-based framework for natural image segmentation
Medeiros et al.2018|Scalable image segmentation via decoupled sub-graph compression
Ehrhardt et al.2014|Statistical shape and appearance models without one-to-one correspondences
Abeysinghe et al.2010|Semi‐isometric Registration of Line Features for Flexible Fitting of Protein Structures
Vogt et al.2015|Facial landmark localization using robust relationship priors and approximative gibbs sampling
CN113920382B|2022-03-15|Cross-domain image classification method based on class consistency structured learning and related device
Shi et al.2019|A geometrical-information-assisted approach for local feature matching
Patent family:
Publication number | Publication date
JP5734460B2|2015-06-17|
CN103403739B|2017-06-13|
JP2014508349A|2014-04-03|
KR20130122662A|2013-11-07|
US20130308861A1|2013-11-21|
EP2668618A1|2013-12-04|
AR085030A1|2013-08-07|
EP2668618B1|2017-03-15|
KR101531618B1|2015-07-06|
WO2012100819A1|2012-08-02|
CN103403739A|2013-11-20|
BR112013019031A2|2017-03-28|
US9008424B2|2015-04-14|
Cited references:
Publication number | Application date | Publication date | Applicant | Patent title

DE10251787A1|2002-11-05|2004-05-19|Philips Intellectual Property & Standards Gmbh|Detecting point correspondences in point quantities for fingerprint verification, involves determining potential association pairs with points from two point quantities and maximum number of pairs|
WO2006073647A2|2004-12-03|2006-07-13|Sarnoff Corporation|Method and apparatus for unsupervised learning of discriminative edge measures for vehicle matching between non-overlapping cameras|
KR100813168B1|2006-06-08|2008-03-17|삼성전자주식회사|Method for extracting object in digital image with shape prior and system for executing the method|
CN101398896B|2007-09-28|2012-10-17|三星电子株式会社|Device and method for extracting color characteristic with strong discernment for image forming apparatus|
GB0807411D0|2008-04-23|2008-05-28|Mitsubishi Electric Inf Tech|Scale robust feature-based indentfiers for image identification|
US8391615B2|2008-12-02|2013-03-05|Intel Corporation|Image recognition algorithm, method of identifying a target image using same, and method of selecting data for transmission to a portable electronic device|
US8401342B2|2009-01-16|2013-03-19|A9.Com, Inc.|System and method to match images using topologically equivalent correspondences|
US8199248B2|2009-01-30|2012-06-12|Sony Corporation|Two-dimensional polynomial model for depth estimation based on two-picture matching|
CN101567045B|2009-05-22|2011-09-07|北京大学|Accurate positioning method of human face characteristic points|
CN101777129B|2009-11-25|2012-05-23|中国科学院自动化研究所|Image matching method based on feature detection|
US20120011119A1|2010-07-08|2012-01-12|Qualcomm Incorporated|Object recognition system with database pruning and querying|
CN101937511B|2010-07-09|2012-07-25|中国人民解放军国防科学技术大学|Rapid image matching method based on stochastic parallel optimization algorithm|JP5896207B2|2011-11-24|2016-03-30|富士ゼロックス株式会社|Distribution evaluation apparatus, distribution determination apparatus, image processing apparatus, and program|
JP5848833B2|2012-01-02|2016-01-27|テレコム・イタリア・エッセ・ピー・アー|Method and system for comparing images|
ITVI20120041A1|2012-02-22|2013-08-23|St Microelectronics Srl|DETECTION OF CHARACTERISTICS OF AN IMAGE|
US9727586B2|2012-10-10|2017-08-08|Samsung Electronics Co., Ltd.|Incremental visual query processing with holistic feature feedback|
US9594942B2|2012-10-11|2017-03-14|Open Text Corporation|Using a probabilistic model for detecting an object in visual data|
EP3100177A1|2014-01-30|2016-12-07|Huawei Technologies Co., Ltd.|Method for recognizing objects|
US20160012594A1|2014-07-10|2016-01-14|Ditto Labs, Inc.|Systems, Methods, And Devices For Image Matching And Object Recognition In Images Using Textures|
GB2529427B|2014-08-19|2021-12-08|Zebra Tech Corp|Processing query image data|
WO2016058626A1|2014-10-13|2016-04-21|Telecom Italia S.P.A.|Method and system for comparing video shots|
JP6541334B2|2014-11-05|2019-07-10|キヤノン株式会社|IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND PROGRAM|
CN105759605A|2015-12-15|2016-07-13|江南大学|Nonlinear system defect detection and positioning algorithm based on adaptive parameter model particle filter |
EP3398164B1|2015-12-30|2020-04-01|Telecom Italia S.p.A.|System for generating 3d images for image recognition based positioning|
US20170323149A1|2016-05-05|2017-11-09|International Business Machines Corporation|Rotation invariant object detection|
WO2017207015A1|2016-05-30|2017-12-07|The Graffter S.L.|Localization of planar objects in images bearing repetitive patterns|
US20180157681A1|2016-12-06|2018-06-07|Ebay Inc.|Anchored search|
US11003963B2|2016-12-27|2021-05-11|Telecom Italia S.P.A.|Method and system for identifying targets in scenes shot by a camera|
JP2020507836A|2017-01-02|2020-03-12|ガウス サージカル, インコーポレイテッドGauss Surgical, Inc.|Tracking surgical items that predicted duplicate imaging|
CN108305281B|2018-02-09|2020-08-11|深圳市商汤科技有限公司|Image calibration method, device, storage medium, program product and electronic equipment|
CN108596197B|2018-05-15|2020-08-25|汉王科技股份有限公司|Seal matching method and device|
CN109117854B|2018-07-25|2021-01-29|北京达佳互联信息技术有限公司|Key point matching method and device, electronic equipment and storage medium|
US10997232B2|2019-01-23|2021-05-04|Syracuse University|System and method for automated detection of figure element reuse|
CN110309815B|2019-07-11|2021-05-11|广州方硅信息技术有限公司|Method and system for processing face recognition data|
Legal status:
2019-01-08| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-10-08| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-02-09| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-03-09| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 10 (TEN) YEARS COUNTED FROM 09/03/2021, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Application date | Patent title
PCT/EP2011/050994|WO2012100819A1|2011-01-25|2011-01-25|Method and system for comparing images|